Meridian Design Doc 6:
Introduction
MERidian stands for Measure, Evaluate, Reward, the three steps of the impact evaluator framework. The Meridian project aims to create an impact evaluator for “off-chain” networks, i.e. networks of nodes that do not maintain a shared ledger or blockchain of transactions.
This doc proposes a framework and model for Meridian that will initially cater to both the Saturn payouts system and the SPARK Station module. As a bonus, it should be able to cater to any Station module. We believe that trying to generalise beyond these few use cases at this point may be counterproductive.
We will structure this design doc based on the three steps of an Impact Evaluator (IE), measure, evaluate and reward.
Overview
Setup
sequenceDiagram
autonumber
participant C as Client
participant I as Contract: IE
participant P as Peer
par Funding
loop
C->>I: Fund work
end
and Work
loop
P->>P: Perform work
end
and Impact Evaluator
loop
I->>P: Measure
I->>I: Evaluate
I->>P: Reward
end
end
Generalizability
What can Meridian implementors re-use?
- Impact evaluator smart contract
- measure service
- evaluate service scaffolding
What do Meridian implementors have to provide?
- Peer
- Deployments of measure and evaluate services
- evaluate service business logic (fraud detection, evaluation)
flowchart TD
MS[Meridian Measure Service]
P[*Peer] --measurement--> MS
MS --commitment--> IE[Meridian IE Contract]
subgraph ES [Meridian Evaluate Service]
E[*Evaluate function]
end
ES--evaluation-->IE
MS--measurement-->ES
IE--rewards-->P
Measure
sequenceDiagram
autonumber
participant P as Peer
participant M as Service: Measure
participant I as Contract: IE
participant E as Service: Evaluate
par Measure
loop
alt Peer uses optional measurement service
P->>M: Upload measurements
activate M
M->>M: Store measurements
M->>M: Aggregate measurements
M->>M: Expose via IPFS
M->>I: addMeasurements(measurementsCID)
deactivate M
else Peer self-submits
P->>P: Aggregate measures
P->>P: Expose via IPFS
P->>I: addMeasurements(measurementsCID)
end
activate I
I->>I: Store measurements
I->>I: Emit Measurement event
I->>I: Maybe advance round
deactivate I
end
and Evaluate: Preprocess
loop
E->>I: Await Measurement event
E->>E: Fetch measures via IPFS
E->>E: Detect fraud
E->>E: Aggregate
E->>E: Store aggregates
end
end
Evaluate
sequenceDiagram
autonumber
participant I as Contract: IE
participant E as Service: Evaluate
I->>I: Advance round
E->>I: Await Round event
E->>I: getRound(round-1)
E->>E: Await pre-processing of all measurements
E->>E: Fetch aggregates
E->>E: Calculate reward shares
E->>I: setScores(round, scores, summary)
I->>I: Store scores
Reward
sequenceDiagram
autonumber
participant I as Contract: IE
participant P as Peer
I->>P: Send FIL
Decentralization of services
Note that while the design uses hosted services in addition to smart contracts, it remains decentralized:
- The measure service is optional, and is hosted for peer convenience. Peers can decide not to use it, or to run their own. Every peer is able to submit measurements themselves, but then needs to take care of aggregation, pinning and gas costs.
- The evaluate service’s outputs will be reproduced by one or more parties, to verify correctness.
Implementations
SPARK x Meridian
SPARK currently requires centralized tasking in order to implement fraud detection. A decentralized tasker has been proposed.
Whether a centralized tasker is a deviation from the impact evaluator architecture isn’t clear. On one hand, tasking might not be considered part of the IE. On the other hand, without the centralized tasker, peers won’t perform any work, and there is no impact to evaluate.
Saturn x Meridian
Measurement
- Saturn doesn’t want to expose the raw measurement logs to the world
- The IPFS interface for transmitting measurements to the evaluate preprocess service might not work for Saturn’s traffic volumes
Evaluate
- Could we get another trusted party to run the evaluation pipeline, so that there’s not a single responsible party (making it centralized)?
Impact Evaluator
Implementation
See also
Round logic
Rounds are advanced based on configurable block number increments. The trigger happens inside the IE’s addMeasurements() function, as it is called frequently during the round.
Round lengths
Round length should be configurable without major scaling considerations, as the Evaluate: preprocess step takes care of pre-aggregating measurements. The more significant factors determining round length are likely peer incentives / feedback loops and gas costs. See also .
It is up to the Meridian implementer to decide how long their rounds shall be. Round lengths can also be adjusted while the IE is running. Adjustments will become effective at the start of next round.
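The round-advancement check described above can be sketched as follows. This is an illustrative TypeScript model with hypothetical names; the real logic lives in Solidity, inside the IE contract's addMeasurements() function.

```typescript
// Sketch of block-number-based round advancement (hypothetical names).
interface RoundState {
  index: number;               // static index into the rounds array
  startBlock: number;          // block at which this round began
  lengthInBlocks: number;      // configurable round length
  nextLengthInBlocks?: number; // adjustment, effective at next round start
}

// Called on every addMeasurements(); advances the round once enough
// blocks have passed since the round started.
function maybeAdvanceRound(state: RoundState, currentBlock: number): RoundState {
  if (currentBlock < state.startBlock + state.lengthInBlocks) {
    return state; // round still in progress
  }
  return {
    index: state.index + 1,
    startBlock: currentBlock,
    // Length adjustments only become effective at the start of the next round
    lengthInBlocks: state.nextLengthInBlocks ?? state.lengthInBlocks,
  };
}
```

Note that the new round starts at the current block, so a quiet period without addMeasurements() calls simply stretches the previous round rather than skipping rounds.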
Alternatives considered
- Advance after time. Similar to advancing based on block numbers, but harder to get right because it relies on clocks
- Advance after measurements. Harder to control, but easier to scale and operate.
Indexing
Rounds can be indexed by their start block number and by their static index in the rounds array.
Payouts and funding
Per round a fixed amount of FIL will be distributed among peers. While this isn’t as incentivizing as allowing over-performers to claim more than anticipated, it helps control cost and fraud detection risk.
In order to incentivize peers to contribute to the system, the contract should be pre-charged a couple rounds in advance. This also helps decouple the IE from the team running it, as peers can be confident there will be rewards even if the team stops operating.
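The fixed-pool model can be sketched as a pro-rata split (illustrative names; the actual distribution happens inside the IE contract):

```typescript
// Sketch: distribute a fixed per-round reward pool pro rata to scores.
// Since the pool is fixed, over-performance raises a peer's share of the
// pool rather than the total payout, which keeps cost bounded.
function rewardShares(
  poolFIL: number,
  scores: Map<string, number> // peer address -> impact score
): Map<string, number> {
  let total = 0;
  for (const score of scores.values()) total += score;
  const payouts = new Map<string, number>();
  for (const [address, score] of scores) {
    payouts.set(address, total === 0 ? 0 : (poolFIL * score) / total);
  }
  return payouts;
}
```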
Measure
See describing components and flow
In the measure step, we refer to each atomic item that gets measured as a job. For example, each retrieval served by a Saturn node is a job. For Spark, each retrieval made from an SP is a job.
This step is implemented using 2 components:
- An opt-in measure service
- The impact evaluator smart contract
If opting in to the hosted measure service, peers periodically submit measurements (job logs) to it; the service commits batches to the impact evaluator smart contract and exposes them over IPFS for the evaluate step.
If not opting in, it is the peer’s responsibility to aggregate measurements, submit them to the smart contract, and expose them via IPFS.
Measurements need to be retrievable via IPFS until rewards have been paid for the round they were submitted in.
Since all measurements are publicly retrievable, the measure service doesn’t need to provide any proofs (of inclusion). If peers find that a measure service misbehaves, they can use another one, deploy their own, or submit directly.
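A self-submitting peer's aggregation step might look like the following sketch. In the real flow the blob is pinned on IPFS and its CID is passed to addMeasurements(); since computing a real CID requires an IPFS library, a sha256 hex digest stands in for it here.

```typescript
import { createHash } from "node:crypto";

// Sketch of peer self-submission: aggregate measurement records into one
// newline-delimited JSON blob and derive a content identifier for it.
// The sha256 digest is a stand-in for the IPFS CID.
interface Measurement {
  job_id: string;
  peer_id: string;
  started_at: string;
  ended_at: string;
  [field: string]: unknown; // network-specific measurement fields
}

function aggregateMeasurements(records: Measurement[]): { blob: string; digest: string } {
  const blob = records.map((r) => JSON.stringify(r)).join("\n");
  const digest = createHash("sha256").update(blob).digest("hex");
  return { blob, digest };
}
```

Because the serialization is deterministic, any party fetching the blob via IPFS can recompute the identifier and verify it matches what was committed on chain.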
Data Model
Measurements
The measure service (or peers directly) periodically submits CIDs to the impact evaluator smart contract.
- How to store large amounts of JSON blobs efficiently? UnixFS or just one node with a large string?
// For readability, these JSON objects have been pretty printed
// Generalized record
{
"job_id": "<UUID or CID>", // Unique job id
"peer_id": "<Libp2p Peer ID>", // Who completed the job
"started_at": "Timestamp", // When did the job begin
"ended_at": "Timestamp", // When did the job end
// Any other fields that are useful measurements of work done
}
// Example Saturn record
{
"job_id": "abcdef",
"peer_id": "<Libp2p Peer ID>",
"started_at": "2023-05-01 00:52:57.62+00",
"ended_at": "2023-05-01 00:52:58.62+00",
"num_bytes_sent": 240,
"request_duration_sec": 10,
"ttfb_ms": 35,
"status_code": 200,
"cache_hit": true
}
// Example SPARK record
{
"job_id": "abcdef",
"peer_id": "<Libp2p Peer ID>",
"started_at": "2023-05-01 00:52:57.62+00",
"ended_at": "2023-05-01 00:52:58.62+00",
"status_code": 200,
"signature_chain": "<signature chain>",
"num_bytes": 200,
"ttfb_ms": 45
}
Implementation
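As a sketch, the record shapes above can be expressed as TypeScript types. Field names are taken from the examples; each network extends the generalized base record with its own measurements of work done.

```typescript
// Sketch: types for the measurement records above.
interface JobRecord {
  job_id: string;     // unique job id (UUID or CID)
  peer_id: string;    // libp2p peer ID of who completed the job
  started_at: string; // when the job began
  ended_at: string;   // when the job ended
}

interface SaturnRecord extends JobRecord {
  num_bytes_sent: number;
  request_duration_sec: number;
  ttfb_ms: number;
  status_code: number;
  cache_hit: boolean;
}

interface SparkRecord extends JobRecord {
  status_code: number;
  signature_chain: string;
  num_bytes: number;
  ttfb_ms: number;
}

// Runtime guard for the generalized fields, e.g. when parsing blobs
// fetched via IPFS.
function isJobRecord(value: unknown): value is JobRecord {
  const r = value as Record<string, unknown>;
  return (
    typeof r?.job_id === "string" &&
    typeof r?.peer_id === "string" &&
    typeof r?.started_at === "string" &&
    typeof r?.ended_at === "string"
  );
}
```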
Evaluate
At this point we have an array of CIDs stored in the impact evaluator’s current round, pointing at measurements exposed via IPFS. The next step is to evaluate over a round’s measurements, using the evaluation function.
Evaluation Function
In general, for node i, where i ∈ {1, …, N}, with logs l_{i,1}, …, l_{i,n_i} and evaluation function f applied to those logs, we can calculate the evaluation output as

e_i = Σ_{j=1}^{n_i} f(l_{i,j})

where n_i is the number of logs of node i and e_i is the evaluation of node i.
In the case of Saturn, the evaluation function is a function of number of bytes sent, TTFB and the request duration. This is calculated by the Saturn payouts system. See https://hackmd.io/@cryptoecon/saturn-aliens/%2FMqxcRhVdSi2txAKW7pCh5Q for more details.
In the case of Spark, the evaluation function is simply a count of the number of successful requests with valid signature chains a Station has performed. Specifically, for node i,

e_i = Σ_{j=1}^{n_i} v_{i,j}

where v_{i,j} = 1 if the log with index j of node i is valid and v_{i,j} = 0 otherwise.
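This per-node count can be sketched as follows. The validity predicate is a stand-in assumption; the real check verifies the signature chain.

```typescript
// Sketch of the Spark evaluation function: a node's score is the number
// of its logs judged valid. isValid is an illustrative stand-in; the
// real predicate verifies the signature chain cryptographically.
interface SparkLog {
  peer_id: string;
  status_code: number;
  signature_chain: string;
}

function isValid(log: SparkLog): boolean {
  return log.status_code === 200 && log.signature_chain.length > 0;
}

function evaluateSpark(logs: SparkLog[]): Map<string, number> {
  const scores = new Map<string, number>();
  for (const log of logs) {
    const current = scores.get(log.peer_id) ?? 0;
    scores.set(log.peer_id, current + (isValid(log) ? 1 : 0));
  }
  return scores;
}
```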
Reproducibility
- In order to decentralize the Evaluate service, it can be run by multiple parties. My first thought was to let the PL-run Evaluate service submit evaluations to chain. The other services can then publish their results off-chain, to confirm that the evaluations are correct. This has the downside that the contract only trusts one party, and aside from reputation the other parties can’t change anything about the contract’s operation.
- Another idea would be to let all evaluating parties submit their results to the contract. Only once a certain quorum of equal results is reached will the contract trigger the rewards phase. This seems more decentralized, but still has the downside that a conflict needs to be resolved by a PL admin.
Multi stage evaluation
In Meridian, evaluation is a 2-stage process. Evaluation stage I is a data preprocessing pipeline that periodically pre-filters and aggregates measurement results. This smaller dataset is then consumed by evaluation stage II, which is executed once before the rewards phase.
The 2-stage design is one of the lessons from the Saturn project: data needs to be aggregated and pre-filtered, as otherwise e.g. a once-a-month evaluation run would operate over too large a dataset and pose serious scaling issues.
Conveniently, fraud detection is also a 2-stage process, and each evaluation stage comes with one fraud detection stage.
Evaluation Stage I: Data preprocessing
See describing components and flow
The data preprocessing pipeline is executed whenever a measure CID has been committed to the impact evaluator smart contract. The pipeline retrieves raw measurements via IPFS, performs its preprocessing steps, and finally stores results in its internal data store.
Fraud Detection
Based on the network-specific fraud detection function, measurements are aggregated into two buckets:
- Honest measurements: Data used for later processing in evaluation stage ii and reward
- Fraudulent measurements: Data kept for reference
The fraud detection function maps an individual measurement to boolean fraudulent status:
flowchart TD
subgraph Measurements
M1[Measurement]
M2[Measurement]
M3[Measurement]
end
subgraph Buckets
BH[Honest]
BF[Fraudulent]
end
Measurements--detect fraud-->Buckets
M1[Measurement] --> BH
M2[Measurement] --> BH
M3[Measurement] --> BF
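The bucketing step can be sketched as a simple partition over the network-specific fraud detection predicate (which the Meridian implementor supplies):

```typescript
// Sketch: partition measurements into Honest and Fraudulent buckets using
// a network-specific fraud detection predicate (assumed, not defined here).
function bucketMeasurements<M>(
  measurements: M[],
  isFraudulent: (m: M) => boolean
): { honest: M[]; fraudulent: M[] } {
  const honest: M[] = [];      // used by evaluation stage II and reward
  const fraudulent: M[] = [];  // kept for reference only
  for (const m of measurements) {
    (isFraudulent(m) ? fraudulent : honest).push(m);
  }
  return { honest, fraudulent };
}
```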
Aggregation
Measurements from both buckets will be aggregated, and those aggregations stored in internal data storage.
However, only measurements from the Honest bucket will count when evaluation stage ii determines the peer’s impact on the system.
flowchart TD
subgraph Aggregates
AH[Honest]
AF[Fraudulent]
end
subgraph Database
TH[Honest]
TF[Fraudulent]
end
AH --> TH
AF --> TF
AH --> Evaluation
Evaluation Stage II
See describing components and flow.
After each round, the evaluate stage ii service converts preprocessed aggregated measurements (from the Honest bucket) into evaluation results.
It also executes a 2nd round of fraud filtering.
By committing scores to the impact evaluator smart contract, evaluations are committed on chain and the reward phase will begin.
The evaluation process runs off chain, because the dataset (all aggregated measurements produced by evaluation stage i) is too large to be handled by smart contracts.
Fraud detection
Multiple processes can mark peers or logs as fraudulent, in between measure and evaluate. For example, the Saturn Orchestrator can mark a peer as fraudulent when it fakes its speed test results.
Therefore, all measurements that are part of the Honest buckets but have later on been flagged as fraudulent (or associated with a peer that has been flagged) will not be fed into the evaluation function.
Data Model
At the end of evaluation stage ii, data of the following shape is committed on chain by calling the setScores function:
- round: The index of the evaluated round
- addresses: The addresses of the peers involved in the round
- scores: The impact scores for each peer involved in the round, which maps to a share of the round’s reward pool
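As a sketch, the payload might be shaped as follows. Field names follow the data model above; the exact Solidity signature of setScores is not fixed here.

```typescript
// Sketch of the data committed via setScores at the end of evaluation
// stage II (illustrative shape, not the final contract ABI).
interface SetScoresPayload {
  round: number;       // index of the evaluated round
  addresses: string[]; // peers involved in the round
  scores: number[];    // impact score per peer, same order as addresses
}

function buildSetScoresPayload(
  round: number,
  scoresByAddress: Map<string, number>
): SetScoresPayload {
  const addresses = [...scoresByAddress.keys()];
  return {
    round,
    addresses,
    // Parallel arrays keep the on-chain encoding cheap compared to a
    // mapping-like structure in calldata.
    scores: addresses.map((a) => scoresByAddress.get(a)!),
  };
}
```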
Reward
In the reward step, the impact evaluator smart contract directly sends peers their reward share, as determined by the previous step.
sequenceDiagram
autonumber
participant I as Contract: IE
participant P as Peer
I->>P: Send FIL
Push vs pull payments
Pull payments have advantages:
- The system doesn’t have to pay the transaction gas fees
- Peers have the freedom to claim whenever they want (e.g. tax benefits), or not to claim at all
- They are potentially less coupled (legally) to the party funding the payments
However, we decided to go with push payments, for these reasons:
- They allow us to change round length freely. With short rounds, peers claiming rewards for each individual round would be cumbersome
- The smart contract logic is significantly easier
- Clients (namely Filecoin Station) don’t need to implement claiming functionality
- Clients don’t need to have an existing balance (for paying gas fees) in order to receive rewards
Smart Contracts
Depending on how complex the contracts turn out to be, we will either hire contractors or write them ourselves. Our current thinking is that the contract work will be simple enough (either because the contracts are simple or because existing contracts can be reused) that we would prefer to write the contracts ourselves. This puts us more in control, and alleviates timeline / cross-team orchestration concerns.
Independent of which team creates the contracts, audits will be required anyway.
Specs
Testing
Testing will depend on the choice of framework we use to develop the smart contracts. The recommendation is that we proceed with Foundry, because it has built-in invariant testing and auto-generates Rust bindings for the contracts, which are useful for integration tests.
Smart Contract Testing
- Unit tests for basic contract functions, such as claiming a payout. With Foundry we can write these tests in Solidity.
- Invariant testing. This is built into Foundry, or can be done with a separate library such as Echidna. It applies fuzz testing to the contract as a whole.
- Static analysis. There are already established analysis tools for EVM smart contracts and we should use them.
One caveat with testing FVM smart contracts is that if we want to use Filecoin-specific features (e.g. Filecoin addresses) in our contracts, then we would be relying on Filecoin precompiles, and that will break a lot of testing libraries. A naive solution could be to maintain two versions of each contract. The Saturn team also started working on a local FVM test executor written in Rust that allows running unit tests on Solidity smart contracts that use Filecoin precompiles. This executor is still rudimentary and needs improvement to be a reliable testing tool.
Unit Tests
Each component should have unit tests to make sure functionality is working. For example we should have extensive unit tests for: log commitment scheme, evaluation functions, etc.
Integration Tests
If we have bindings for the contracts, we can easily write integration tests for some end-to-end flows that run on calibration net. This just requires a burner wallet with some test FIL in it. Saturn already has examples of this.
Auditing
After we complete our smart contracts, we should have them audited and publish the audit publicly.
Observability
Take inspiration from the Saturn internal dashboard.
Create a generalized dashboard template for all Meridian systems.
SPARK x Meridian roadmap
SPARK will be the first Meridian implementation. The use case of SPARK will be used to create the reusable infrastructure (services & smart contracts) that Meridian will offer to future implementors.
Quality criteria
- For each service or smart contract
- Test suite
- Testnet deployment
- Mainnet deployment
- For each smart contract
- Audit
- Static analysis
- For each service
- Observability through Sentry & Grafana
No walking skeleton
While it is a popular pattern (and one liked by the team) to first create a walking skeleton, with all the components of a software system in their most basic form talking to each other, it is not a great fit for Meridian.
- The sooner the finished measure step is deployed, the sooner we will collect real data that will later be consumed by the following steps. Therefore, developing the steps in parallel will shift the timeline unfavourably
- The steps have clear boundaries with well-defined interfaces, thanks to the theoretical foundation of the Impact Evaluator Framework
- A deployed measure step can already collect data, while a deployed yet unfinished evaluate step shouldn’t start evaluating them
- Sequential flow of development fits sequential flow of system
The team is therefore going to implement measure, evaluate and reward in the traditional waterfall model. One could also argue that the result of this document’s roadmap will be just this walking skeleton.
| Item | DRI | Notes |
|---|---|---|
| Interface with legal | PM | |
| Business model exploration | PM | |
| Boost interface | Eng Lead | |
| Smart contract contractors | PM | Contractor? |
| Smart contract auditors | PM | Contractor |
| Measure Eng work | Eng Lead | |
| Evaluate Eng work | Eng Lead | |
| Reward Eng work | Eng Lead | |
| Meridian Website | PM | Contractor |
| Update Station Website | PM + Contractor | Contractor |
| Lab week planning | PM | |
| Swag | PM | |
| Product market fit work | PM |
Notes 2023-08-03
Need a smart contract that is running the whole loop
Want it to operate as close as you can to how block rewards operate
Structure on chain that nobody controls
Entity that nobody controls that is on chain, that runs the process of rewarding
Look at Juan’s workshops into how contract structure should work
Taking blockchain reward model and not change it at all
Overall smart contract for the IE
Per round
Sampling steps into the SP to find a CID
Orchestrator should sample CIDs
There is no list of CIDs anywhere
Flesh out the e2e structure of Meridian and deploy a version of it, even if measurement and rewarding isn’t as good as it should be.
For small amounts of reward, it won’t be worth doing fraud
Honest vs fraudulent log classifier
Block reward model has a concrete time structure
Next steps
- Clean up document
- Start document: Round lengths
- Start document: IE PaaS deployment
https://github.com/filecoin-station/meridian-measure-service/tree/main